With many years of experience in small domestic appliances and sustainability as a core value, ACME Company competes with the major players on the European market. Its unique user experience, strong customer loyalty, and reliable, high-quality service make the company worth investing in.
In Germany, sales in the household appliance market are expected to grow at an average annual rate of around 1.5% from 2021 to 2025, reaching EUR 9,963 million in 2021. The global market is highly fragmented, divided between international and regional players. Many regional players struggle to compete with the international firms on a cost basis, and ACME is no exception. The company has therefore shifted to a customer-centric business model, but it still struggles to improve that model enough to retain its customer base.
To identify deviations in process performance, ACME extracted event data from its ERP system.
First of all, we need to install some important packages:
install.packages("tidyverse",repos="http://cran.us.r-project.org")
##
## The downloaded binary packages are in
## /var/folders/9f/74q5ctws39b7kcn4_836ldj40000gn/T//RtmpgmcTWm/downloaded_packages
install.packages("readr",repos="http://cran.us.r-project.org")
install.packages("edeaR",repos="http://cran.us.r-project.org")
install.packages("dplyr",repos="http://cran.us.r-project.org")
install.packages("ggplot2",repos="http://cran.us.r-project.org")
install.packages("lubridate",repos="http://cran.us.r-project.org")
install.packages("bupaR",repos="http://cran.us.r-project.org")
install.packages("processmapR",repos="http://cran.us.r-project.org")
install.packages("FactoMineR",repos="https://CRAN.R-project.org")
install.packages("factoextra",repos="https://CRAN.R-project.org")
install.packages("kohonen",repos="https://CRAN.R-project.org")
install.packages("cluster",repos="https://CRAN.R-project.org")
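As an aside, the eleven `install.packages()` calls above can be collapsed into a loop that installs only the packages that are not yet present. This is a common convenience idiom, not part of the original analysis:

```r
# Hypothetical convenience wrapper: install only the packages that are missing
pkgs <- c("tidyverse", "readr", "edeaR", "dplyr", "ggplot2", "lubridate",
          "bupaR", "processmapR", "FactoMineR", "factoextra", "kohonen", "cluster")
missing <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing) > 0) {
  install.packages(missing, repos = "https://cran.us.r-project.org")
}
```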
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.1 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readr)
library(edeaR)
##
## Attaching package: 'edeaR'
## The following object is masked from 'package:base':
##
## setdiff
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
library(ggplot2)
library(bupaR)
##
## Attaching package: 'bupaR'
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:utils':
##
## timestamp
library(FactoMineR)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(kohonen)
##
## Attaching package: 'kohonen'
## The following object is masked from 'package:purrr':
##
## map
library(cluster)
ACME_EventLog_data <- read_delim("EventLog_ACME_TeachingCase.csv", delim = ";", escape_double = FALSE, trim_ws = TRUE)
## Rows: 178078 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr (4): CASE_ID, ACTIVITY, DEVICETYPE, SERVICEPOINT
## dbl (1): REPAIR_IN_TIME_5D
## dttm (1): TIMESTAMP
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(ACME_EventLog_data)
## spec_tbl_df[,6] [178,078 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ CASE_ID : chr [1:178078] "Case10" "Case10" "Case10" "Case10" ...
## $ ACTIVITY : chr [1:178078] "Creation" "Letter" "DeviceReceived" "StockEntry" ...
## $ TIMESTAMP : POSIXct[1:178078], format: "2018-01-02 13:39:47" "2018-01-05 00:00:00" ...
## $ REPAIR_IN_TIME_5D: num [1:178078] 0 0 0 0 0 0 0 0 0 0 ...
## $ DEVICETYPE : chr [1:178078] "AB52" "AB52" "AB52" "AB52" ...
## $ SERVICEPOINT : chr [1:178078] "E" "E" "E" "E" ...
## - attr(*, "spec")=
## .. cols(
## .. CASE_ID = col_character(),
## .. ACTIVITY = col_character(),
## .. TIMESTAMP = col_datetime(format = ""),
## .. REPAIR_IN_TIME_5D = col_double(),
## .. DEVICETYPE = col_character(),
## .. SERVICEPOINT = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
ACME_EventLog_data %>% head(n = 10)
## # A tibble: 10 x 6
## CASE_ID ACTIVITY TIMESTAMP REPAIR_IN_TIME_… DEVICETYPE SERVICEPOINT
## <chr> <chr> <dttm> <dbl> <chr> <chr>
## 1 Case10 Creation 2018-01-02 13:39:47 0 AB52 E
## 2 Case10 Letter 2018-01-05 00:00:00 0 AB52 E
## 3 Case10 DeviceR… 2018-01-05 16:45:34 0 AB52 E
## 4 Case10 StockEn… 2018-01-17 00:00:00 0 AB52 E
## 5 Case10 InDeliv… 2018-01-17 00:00:00 0 AB52 E
## 6 Case10 NoteWor… 2018-01-17 07:37:19 0 AB52 E
## 7 Case10 Complet… 2018-01-17 09:34:32 0 AB52 E
## 8 Case100 Creation 2018-01-02 15:43:48 0 AB41 E
## 9 Case100 NoteHot… 2018-01-02 15:44:41 0 AB41 E
## 10 Case100 Letter 2018-01-08 00:00:00 0 AB41 E
ACME_EventLog_data$TIMESTAMP <- ymd_hms(ACME_EventLog_data$TIMESTAMP, tz = "CET")
sum(is.na(ACME_EventLog_data$DEVICETYPE))
## [1] 20
ACME_EventLog_data$CASE_ID <- as.factor(ACME_EventLog_data$CASE_ID)
ACME_EventLog_data$ACTIVITY <- as.factor(ACME_EventLog_data$ACTIVITY)
ACME_EventLog_data$REPAIR_IN_TIME_5D <- as.factor(ACME_EventLog_data$REPAIR_IN_TIME_5D)
ACME_EventLog_data$DEVICETYPE <- as.factor(ACME_EventLog_data$DEVICETYPE)
ACME_EventLog_data$SERVICEPOINT <- as.factor(ACME_EventLog_data$SERVICEPOINT)
ACME_EventLog_data$Lifecycle <- as.factor("Start") # extra lifecycle column required by eventlog()
ACME_month <- ACME_EventLog_data %>%
group_by(CASE_ID) %>%
arrange(as.numeric(gsub("Case","", CASE_ID)),TIMESTAMP) %>% # sort by CASE_ID, then TIMESTAMP; CASE_ID is a string containing a number, so we strip the "Case" prefix to sort numerically
filter(row_number() == 1) %>% # keep only the first event per case
mutate(Month = floor_date(TIMESTAMP, "month")) %>% # new variable: month of the first event
select(CASE_ID,Month) # keep the columns "CASE_ID" and "Month"
head(ACME_month,n = 10)
## # A tibble: 10 x 2
## # Groups: CASE_ID [10]
## CASE_ID Month
## <fct> <dttm>
## 1 Case2 2018-01-01 00:00:00
## 2 Case3 2018-01-01 00:00:00
## 3 Case4 2018-01-01 00:00:00
## 4 Case5 2018-01-01 00:00:00
## 5 Case6 2018-01-01 00:00:00
## 6 Case8 2018-01-01 00:00:00
## 7 Case9 2018-01-01 00:00:00
## 8 Case10 2018-01-01 00:00:00
## 9 Case11 2018-01-01 00:00:00
## 10 Case13 2018-01-01 00:00:00
ACME_EventLog_data <- ACME_EventLog_data %>%
left_join(ACME_month, by = "CASE_ID") %>%
arrange(as.numeric(gsub("Case","", CASE_ID)),TIMESTAMP)
ACME.log <- ACME_EventLog_data %>%
mutate(activity_instance = 1:nrow(.)) %>% # add an activity_instance column (one unique id per event) as required by eventlog()
eventlog(
case_id = "CASE_ID",
activity_id = "ACTIVITY",
activity_instance_id = "activity_instance",
lifecycle_id = "Lifecycle",
timestamp = "TIMESTAMP",
resource_id = "DEVICETYPE"
)
ACME.log %>% summary
## Number of events: 178078
## Number of cases: 23906
## Number of traces: 2720
## Number of distinct activities: 13
## Average trace length: 7.449092
##
## Start eventlog: 2013-05-22 10:39:39
## End eventlog: 2019-06-28 08:39:30
## CASE_ID ACTIVITY TIMESTAMP
## Length:178078 Creation :23907 Min. :2013-05-22 10:39:39
## Class :character DeviceReceived:23906 1st Qu.:2018-06-11 09:41:52
## Mode :character InDelivery :23906 Median :2018-10-31 10:17:36
## Completed :23890 Mean :2018-10-16 14:42:48
## NoteHotline :17764 3rd Qu.:2019-02-23 10:12:57
## Letter :16453 Max. :2019-06-28 08:39:30
## (Other) :48252
## REPAIR_IN_TIME_5D DEVICETYPE SERVICEPOINT Lifecycle
## 0:120029 AB52 :26726 E :63851 Start:178078
## 1: 58049 AB49 :15740 L :63549
## AB62 :13931 J :16178
## AB61 :11999 G :12832
## AB20 :10080 K : 1837
## (Other):99582 (Other): 3313
## NA's : 20 NA's :16518
## Month activity_instance .order
## Min. :2013-05-01 00:00:00 Length:178078 Min. : 1
## 1st Qu.:2018-05-01 00:00:00 Class :character 1st Qu.: 44520
## Median :2018-10-01 00:00:00 Mode :character Median : 89040
## Mean :2018-09-18 20:25:50 Mean : 89040
## 3rd Qu.:2019-02-01 00:00:00 3rd Qu.:133559
## Max. :2019-06-01 00:00:00 Max. :178078
##
case_count_per_month <- ACME.log %>%
distinct(Month, CASE_ID) %>%
group_by(Month) %>%
summarise(Case_count = n()) %>%
mutate(Anteil = Case_count / sum(Case_count)) %>% # Anteil = share of all cases
arrange(Case_count)
case_count_per_month %>%
summarise(Total_case = sum(Case_count))
## # A tibble: 1 x 1
## Total_case
## <int>
## 1 23906
barplot(t(case_count_per_month[c('Case_count')]),
names.arg=t(case_count_per_month[c('Month')]),
las=1,
horiz=TRUE, #choose horizontal barchart
xlab="Number of cases",
main="Case count over time")
case_count_per_month %>%
ggplot(aes(x = as.Date(Month), y = Case_count)) +
theme_minimal() +
geom_line() +
scale_x_date(date_labels = "%d-%m-%Y") +
labs(title = "Number of cases over time",
x = "Time",
y = "Number")
As we can see, the number of cases rose over time, peaked in January 2019, and then declined.
Repair_intime <- ACME.log %>%
distinct(CASE_ID,REPAIR_IN_TIME_5D,.keep_all = TRUE) %>% # keep one row per case
count(REPAIR_IN_TIME_5D, name ="count_repair") %>% # count cases repaired within 5 days vs. not
mutate(Anteil = count_repair / sum(count_repair)) # add the relative share
The share of cases completed within 5 days is not impressive, at just 34.65%.
Activity_per_Case <- ACME.log %>%
distinct(CASE_ID,ACTIVITY) %>%
count(CASE_ID,name = "Activity_per_case") %>% # number of distinct activities per case
count(Activity_per_case,name = "Count") # number of cases sharing each count
Activity_per_Case %>%
ggplot(aes(x = reorder(Activity_per_case,Count), y = Count)) +
theme_minimal() +
geom_col() +
labs(title = "Distribution of activities per case",
x = "Activities per case",
y = "Count")
Most cases contain between 6 and 9 activities.
Activity_Frequency <- ACME.log %>%
activity_frequency(level = "activity") #this function creates absolute and relative frequency of each activity
Activity_Frequency
## # A tibble: 13 x 3
## ACTIVITY absolute relative
## <fct> <int> <dbl>
## 1 Creation 23907 0.134
## 2 DeviceReceived 23906 0.134
## 3 InDelivery 23906 0.134
## 4 Completed 23890 0.134
## 5 NoteHotline 17764 0.0998
## 6 Letter 16453 0.0924
## 7 NoteWorkshop 16010 0.0899
## 8 Approved 9708 0.0545
## 9 Transmission 9266 0.0520
## 10 StockEntry 8726 0.0490
## 11 StatusRequest 4527 0.0254
## 12 FreeticketComp 13 0.0000730
## 13 FreeticketCust 2 0.0000112
Here we can see that almost every created case is eventually completed (23,907 "Creation" events vs. 23,890 "Completed" events).
Activity_Frequency %>%
ggplot(aes(x = reorder(ACTIVITY,absolute), y = absolute)) + #sort bar
theme_minimal() +
geom_col() +
coord_flip() +
labs(title = "Most frequent Activities",
x = "Activity",
y = "Count")
Resource_Frequency <- ACME.log %>%
distinct(CASE_ID,.keep_all = TRUE) %>%
filter_resource_frequency(perc = 1) %>% # keep all resources (100% coverage)
resources() # absolute and relative frequency of each resource
Resource_Frequency %>%
ggplot(aes(x = reorder(DEVICETYPE,absolute_frequency), y = absolute_frequency)) +
theme_minimal() +
geom_col() +
scale_x_discrete(guide = guide_axis(angle = 90)) +
labs(title = "Most frequent Resources",
x = "Resources",
y = "Count")
This chart shows which device types require repair most often.
ACME.log %>%
filter_trace_length(percentage = 1) %>% # keep all traces
trace_length() # distribution of trace length (number of activities per trace)
## min q1 median mean q3 max st_dev iqr
## 4.000000 6.000000 7.000000 7.449092 8.000000 13.000000 1.472449 2.000000
On average, a trace contains about 7 to 8 activities.
trace_coverage <- ACME.log %>% # absolute frequency of each trace
trace_coverage(level = "case") %>%
distinct(trace,absolute)
trace_coverage <- trace_coverage %>%
mutate(length = sapply(strsplit(trace_coverage$trace, ","), length)) # trace length = number of activities
trace_throughput_time <- throughput_time(ACME.log, "trace") # average throughput time of each trace
trace_explorer <- merge(trace_coverage,trace_throughput_time,by = "trace") # merge trace frequency and throughput time
head(trace_explorer[,-c(1)],10)
## absolute length relative_trace_frequency min q1 mean
## 1 1 8 0.0000418305 43.045417 43.045417 43.045417
## 2 1 10 0.0000418305 37.128981 37.128981 37.128981
## 3 1 8 0.0000418305 6.935891 6.935891 6.935891
## 4 1 8 0.0000418305 5.843542 5.843542 5.843542
## 5 1 8 0.0000418305 47.909236 47.909236 47.909236
## 6 1 7 0.0000418305 52.520150 52.520150 52.520150
## 7 1 6 0.0000418305 27.929421 27.929421 27.929421
## 8 22 6 0.0009202711 8.847407 10.729068 18.830881
## 9 1 7 0.0000418305 16.675417 16.675417 16.675417
## 10 9 7 0.0003764745 2.868090 10.044618 13.300734
## median q3 max st_dev iqr total
## 1 43.045417 43.045417 43.045417 NA 0.000000 43.045417
## 2 37.128981 37.128981 37.128981 NA 0.000000 37.128981
## 3 6.935891 6.935891 6.935891 NA 0.000000 6.935891
## 4 5.843542 5.843542 5.843542 NA 0.000000 5.843542
## 5 47.909236 47.909236 47.909236 NA 0.000000 47.909236
## 6 52.520150 52.520150 52.520150 NA 0.000000 52.520150
## 7 27.929421 27.929421 27.929421 NA 0.000000 27.929421
## 8 18.099867 23.147127 36.718900 8.847696 12.418058 414.279387
## 9 16.675417 16.675417 16.675417 NA 0.000000 16.675417
## 10 11.142847 19.081192 27.141308 7.945907 9.036574 119.706609
trace_coverage %>%
mutate(Trace_Variant = 1:nrow(.)) %>%
head(n = 80) %>% # show the 80 traces with the highest coverage
ggplot(aes(x = reorder(Trace_Variant,absolute), y = absolute)) + # order bars by frequency
theme_minimal() +
geom_col() +
scale_x_discrete(guide = guide_axis(angle = 90)) +
labs(title = "Most frequent traces",
x = "Trace variant",
y = "Count")
These 80 traces cover almost all cases.
throughput_per_case <- throughput_time(ACME.log, "case") # throughput time of each case
throughput_per_month <- ACME.log %>% # average throughput time of all cases, per month
distinct(Month, CASE_ID) %>%
left_join(throughput_per_case, by = "CASE_ID") %>%
group_by(Month) %>%
summarize(Throughput.Avg = mean(throughput_time)) %>%
arrange(Month)
Average_throughput_per_case <- mean(throughput_per_case$throughput_time) # average throughput time over all cases
summary(throughput_per_case$throughput_time) # delivers distribution of throughput time
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3474 7.9895 15.0930 20.1811 24.9501 1803.9291
quantile(throughput_per_case$throughput_time, probs = seq(0.1, 1, 0.1)) # deciles of the throughput time
## 10% 20% 30% 40% 50% 60%
## 4.040179 6.963762 9.161169 12.794653 15.093003 18.910567
## 70% 80% 90% 100%
## 22.106551 27.984931 38.841794 1803.929132
ggplot(throughput_per_month, aes(x = Month, y = Throughput.Avg)) +
theme_bw() +
geom_line(size = 0.75) +
labs(x = "Month & Year", y = "Average Case Throughput in Days",
title = "Case Throughput Averages for Complete Log File")
head(throughput_per_month, nrow(throughput_per_month))
## # A tibble: 19 x 2
## Month Throughput.Avg
## <dttm> <dbl>
## 1 2013-05-01 00:00:00 1793.
## 2 2018-01-01 00:00:00 16.4
## 3 2018-02-01 00:00:00 17.5
## 4 2018-03-01 00:00:00 20.7
## 5 2018-04-01 00:00:00 19.3
## 6 2018-05-01 00:00:00 21.6
## 7 2018-06-01 00:00:00 19.1
## 8 2018-07-01 00:00:00 25.3
## 9 2018-08-01 00:00:00 24.9
## 10 2018-09-01 00:00:00 21.9
## 11 2018-10-01 00:00:00 19.5
## 12 2018-11-01 00:00:00 18.5
## 13 2018-12-01 00:00:00 22.6
## 14 2019-01-01 00:00:00 20.5
## 15 2019-02-01 00:00:00 20.4
## 16 2019-03-01 00:00:00 19.4
## 17 2019-04-01 00:00:00 19.6
## 18 2019-05-01 00:00:00 16.7
## 19 2019-06-01 00:00:00 9.86
activity_profile <- as.data.frame.matrix(table(ACME.log[c(1,2)])) # case-by-activity matrix: each cell counts how often the activity occurs in the case
head(activity_profile,10)
## Approved Completed Creation DeviceReceived FreeticketComp
## Case10 0 1 1 1 0
## Case100 0 1 1 1 0
## Case1000 0 1 1 1 0
## Case10002 0 1 1 1 0
## Case10003 0 1 1 1 0
## Case10006 1 1 1 1 0
## Case10007 0 1 1 1 0
## Case10008 0 1 1 1 0
## Case10009 0 1 1 1 0
## Case1001 1 1 1 1 0
## FreeticketCust InDelivery Letter NoteHotline NoteWorkshop
## Case10 0 1 1 0 1
## Case100 0 1 1 1 1
## Case1000 0 1 1 0 1
## Case10002 0 1 0 1 1
## Case10003 0 1 1 1 1
## Case10006 0 1 1 1 0
## Case10007 0 1 0 1 1
## Case10008 0 1 0 1 1
## Case10009 0 1 1 1 1
## Case1001 0 1 1 1 0
## StatusRequest StockEntry Transmission
## Case10 0 1 0
## Case100 1 1 0
## Case1000 0 0 0
## Case10002 0 0 0
## Case10003 0 1 0
## Case10006 0 0 1
## Case10007 0 0 0
## Case10008 0 1 0
## Case10009 0 0 0
## Case1001 0 0 1
happy_path = head(trace_explorer[order(-trace_explorer$absolute),],1) # the most frequent trace is the happy path
print(paste("Happy Path:",happy_path$trace))
## [1] "Happy Path: Creation,DeviceReceived,Letter,Transmission,Approved,NoteHotline,InDelivery,Completed"
print(paste("Happy Path's absolute frequency:",happy_path$absolute, "times"))
## [1] "Happy Path's absolute frequency: 964 times"
print(paste("Happy Path average throughput time:",round(happy_path$mean), "days"))
## [1] "Happy Path average throughput time: 24 days"
ACME.log %>%
group_by_case() %>%
trace_explorer(coverage = 0.20, coverage_labels = c("absolute","relative", "cumulative")) #Coverage grade: 20%
## Warning: `rename_()` was deprecated in dplyr 0.7.0.
## Please use `rename()` instead.
# Dotted chart of cases that start between 2018-01-01 and 2018-12-31
ACME.log %>%
filter_time_period(interval = ymd(c(20180101, 20181231)), filter_method = "start") %>%
dotted_chart
## Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## Warning: The `add` argument of `group_by()` is deprecated as of dplyr 1.0.0.
## Please use the `.add` argument instead.
## Joining, by = "CASE_ID"
## Warning: It is deprecated to specify `guide = FALSE` to remove a guide. Please
## use `guide = "none"` instead.
ACME.log %>%
filter_activity_frequency(percentage = 1.0) %>%
filter_trace_frequency(percentage = .5) %>%
filter_endpoints(start_activities = "Creation", end_activities = "Completed") %>%
process_map(type = frequency("absolute"))
monthly_performance_profile <- merge(case_count_per_month,throughput_per_month,by = "Month")
head(monthly_performance_profile,10)
## Month Case_count Anteil Throughput.Avg
## 1 2013-05-01 3 0.0001254915 1793.37977
## 2 2018-01-01 1373 0.0574332803 16.43705
## 3 2018-02-01 1174 0.0491090103 17.50974
## 4 2018-03-01 1286 0.0537940266 20.71593
## 5 2018-04-01 1127 0.0471429767 19.28143
## 6 2018-05-01 1222 0.0511168744 21.61522
## 7 2018-06-01 1259 0.0526646030 19.08236
## 8 2018-07-01 1310 0.0547979587 25.32606
## 9 2018-08-01 1298 0.0542959926 24.88538
## 10 2018-09-01 1279 0.0535012131 21.85503
ACME.log %>%
filter_activity_frequency(percentage = 1.0) %>%
filter_trace_frequency(percentage = .5) %>%
filter_endpoints(start_activities = "Creation", end_activities = "Completed") %>%
process_map(performance(median, "days"))
means <- apply(activity_profile,2 ,mean) # column means
sds <- apply(activity_profile,2,sd) # column standard deviations
nor <- scale(activity_profile,center=means,scale=sds+1) # z-score normalization; some columns have sd = 0, so we add 1 to avoid NaN values (results are only slightly affected)
nor1 <- as.data.frame(nor)
distance = dist(nor,method = "euclidean") # Euclidean distance matrix
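The sd + 1 trick above can be seen on a toy example (a small sketch, not part of the original pipeline): a constant column has a standard deviation of 0, so plain z-scoring divides by zero and produces NaN, while inflating the scale by one keeps every value finite.

```r
# Toy matrix: column b is constant, so its standard deviation is 0
m <- cbind(a = c(1, 2, 3), b = c(5, 5, 5))
plain    <- scale(m)  # divides column b by sd = 0 -> NaN
adjusted <- scale(m, center = colMeans(m), scale = apply(m, 2, sd) + 1)
any(is.nan(plain))     # TRUE: the constant column became NaN
any(is.nan(adjusted))  # FALSE: all values remain finite
```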
set.seed(123)
g <- somgrid(xdim = 4, ydim = 4, topo = "rectangular" )
map <- som(nor,
grid = g,
alpha = c(0.05, 0.01),
radius = 1)
plot(map,type = "codes")
## Hierarchical agglomerative clustering
mydata.hclust = hclust(distance,method ="average")
plot(mydata.hclust)
wss <- (nrow(activity_profile)-1)*sum(apply(activity_profile,2,var)) # total within-group sum of squares for k = 1
for (i in 1:20) wss[i] <- sum(kmeans(activity_profile, centers=i)$withinss)
plot(1:20, wss, type="b", xlab="Number of clusters", ylab="Within-groups sum of squares")
set.seed(123)
kc <- kmeans(nor, 10)
fviz_cluster(kc, data = nor[,-c(4,7)]) # drop the columns "DeviceReceived" and "InDelivery" for the visualization: these activities occur in almost every case, so their standard deviation is close to 0 and after normalization they contain only NaN/0 values, which prevents plotting
ACME struggles with its customer service because completing a single customer request takes a long time. To achieve its goals, ACME needs to reduce the overall throughput time and the complexity of its process model. We suggest that ACME focus on reducing the throughput time around the following activities: DeviceReceived, Letter, and Transmission.
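To quantify where that time is lost, one could compute the waiting time between consecutive events per case directly from the event log. The sketch below uses a small hand-made log that merely reuses ACME's column names; the data frame itself is illustrative, not from the case study.

```r
library(dplyr)

# Illustrative mini event log with ACME's column names (made-up data)
log <- data.frame(
  CASE_ID   = c("Case1", "Case1", "Case1", "Case2", "Case2"),
  ACTIVITY  = c("Creation", "DeviceReceived", "Completed",
                "Creation", "Completed"),
  TIMESTAMP = as.POSIXct(c("2018-01-01 08:00", "2018-01-03 08:00",
                           "2018-01-06 08:00", "2018-02-01 09:00",
                           "2018-02-02 09:00"))
)

# Waiting time (in days) between each activity and its predecessor, per case
gaps <- log %>%
  arrange(CASE_ID, TIMESTAMP) %>%
  group_by(CASE_ID) %>%
  mutate(
    prev_activity = lag(ACTIVITY),
    wait_days     = as.numeric(difftime(TIMESTAMP, lag(TIMESTAMP), units = "days"))
  ) %>%
  ungroup() %>%
  filter(!is.na(prev_activity))

# Average wait per activity transition: the transitions with the largest
# averages are the first candidates for reducing throughput time
gaps %>%
  group_by(prev_activity, ACTIVITY) %>%
  summarise(avg_wait_days = mean(wait_days), .groups = "drop")
```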